Group-189

Keesari Shravya - 2020FC04582

Abhijith K.S. - 2020FC04193

Pranesh V - 2020FC04961

SMS Spam Collection Dataset

a) Download the file and load it into a DataFrame.
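A minimal sketch of step (a), assuming the downloaded file follows the tab-separated layout of the UCI `SMSSpamCollection` file (label first, then message, no header row). The two inline rows, the in-memory buffer, and the column name `label` are illustrative stand-ins; `sms` matches the column name used later in this notebook. With the real file, the buffer would be replaced by the path to the download.

```python
import io
import pandas as pd

# Two made-up rows standing in for the downloaded file
# (assumption: tab-separated, no header row).
sample = (
    "ham\tAre we still meeting at 5?\n"
    "spam\tWIN a FREE prize now! Call 0800 000 000\n"
)

# header=None because the raw file has no header line;
# names= supplies the column labels.
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None, names=["label", "sms"])

print(df.shape)
print(df["label"].value_counts())
```

If the real messages contain stray double quotes, passing `quoting=csv.QUOTE_NONE` (from the standard `csv` module) to `read_csv` may be needed so that they are not treated as field delimiters.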

Exploratory Data Analysis (EDA)

Creating Corpus

b) Remove punctuation, special characters, and stopwords from the text in the ‘sms’ column, and convert the text to lowercase.
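The cleaning in step (b) could look roughly like the sketch below. It uses only the standard library; the function name `clean_sms` and the short `STOPWORDS` set are hypothetical, and a full stopword list (for example, the one shipped with NLTK) would normally take the place of the small set shown here.

```python
import re

# Tiny illustrative stopword set (assumption); a full list would be used in practice.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "at", "we", "you", "now", "and", "of"}

def clean_sms(text: str) -> str:
    """Lowercase the text, strip punctuation/special characters, drop stopwords."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)

print(clean_sms("WIN a FREE prize now! Call 0800 000 000"))
# → win free prize call 0800 000 000
```

Applied to a DataFrame, the same function would be mapped over the column, for example `df["sms"].apply(clean_sms)`.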

c) Create two objects, X and y. Create a CountVectorizer object and split the data into training and testing sets. Train a MultinomialNB model and display the confusion matrix.

Model building

Multinomial NB model

Classification Report for Train and Test Data

Confusion Matrix for Train Data

Confusion Matrix for Test Data
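The steps in part (c) can be sketched end to end with scikit-learn as follows. The eight messages and their labels are made up so that the block runs on its own; `test_size=0.25`, `random_state=42`, and the stratified split are arbitrary choices for this sketch, not settings taken from the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up, already-cleaned messages standing in for the 'sms' column (assumption).
texts = [
    "free prize call now", "win cash claim prize",
    "free entry win cash", "urgent claim free reward",
    "meeting moved to five", "see you at lunch",
    "call me when home", "thanks for the lift",
]
labels = ["spam"] * 4 + ["ham"] * 4

# X: bag-of-words count matrix; y: target labels.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
y = labels

# Stratify so that both classes appear in the training and the testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = MultinomialNB()
model.fit(X_train, y_train)

# Report metrics separately for the training and the testing data.
for name, X_part, y_part in [("Train", X_train, y_train), ("Test", X_test, y_test)]:
    y_pred = model.predict(X_part)
    print(f"{name} classification report")
    print(classification_report(y_part, y_pred, zero_division=0))
    print(f"{name} confusion matrix (rows = true, columns = predicted; order: ham, spam)")
    print(confusion_matrix(y_part, y_pred, labels=["ham", "spam"]))
```

The vectorizer is fitted before the split here to mirror the order of the task statement. A stricter evaluation would fit it on the training messages only and call `transform` on the test messages, so that no vocabulary from the test set reaches the model.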

d) Display the POS tags for the first 4 rows of the ‘sms’ column.

e) Build and display a dependency parse tree for the sentence:

Showing POS tagging for each word in the given sentence